Aaron emailed me back with advice on a way to make eggNog-mapper/2.1.12 work on hawk. As I am revising for an exam I can't spend too long on this. P.S. I think I did pretty well on the exam.
From the 10th to the 12th I tried running a [[test file]] I made earlier in December 2024. I may not have written about it, as I didn't get anywhere at that time. This process was also hindered by hawk being heavily loaded in the new year, meaning it takes a good half day for any results to appear. On the 11th I managed to get a successful result, with the .xlsx file I need for the heatmaps, for accession 3Dt1c. Today (the 12th) I set off 3 jobs, each containing 3 accessions, to hopefully get the results of the remaining 9. If there are no complications I will then be in a place where I can obtain the .fastas for the online comparison accessions and then run them through eggNog-mapper. It should be noted that I used the same list of parameters I found on the web version of hawk for this, [[screenshot attached]].

I had some marginal success: the second set of three finished in over 3 hours, and the other 2 sets are still going strong after 8, so I'll let them time out overnight and see what I have. Maybe they will complete; I did give them 12 hours.

I then discovered the --dbmem option, which could help to speed up the process. I tested with [[this file]], set to run overnight from roughly 9:30 pm on the 12th to 3:30 am on the 13th, totalling 6 hours for 5 files, not great. I then experimented with taking out some of the arguments (--evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20); that produced [[this script]]. Tomorrow (14th) I will have a look at recreating run 1 with --dbmem switched on.
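
For reference, a minimal sketch of how the two variants (with and without the filter arguments, with --dbmem on or off) could be generated consistently. It only builds and prints the command line and does not run anything. The file names, output directory and CPU count are hypothetical, and the function name is mine. `-i`, `-o`, `--output_dir`, `--cpu` and `--excel` are emapper.py options as I understand them, so they are worth checking against `emapper.py --help` for 2.1.12.

```python
import shlex

# Filter arguments quoted in the notes above; Run 3 dropped these.
FILTER_ARGS = [
    "--evalue", "0.001", "--score", "60", "--pident", "40",
    "--query_cover", "20", "--subject_cover", "20",
]


def build_emapper_cmd(fasta, out_prefix, out_dir, cpus=8, dbmem=True, filters=True):
    """Return the emapper.py command as a list of arguments (nothing is executed)."""
    cmd = [
        "emapper.py",
        "-i", fasta,              # input FASTA (hypothetical file name below)
        "-o", out_prefix,         # prefix for the output files
        "--output_dir", out_dir,
        "--cpu", str(cpus),
        "--excel",                # assumption: also writes the .xlsx needed for the heatmaps
    ]
    if dbmem:
        cmd.append("--dbmem")     # load the eggNOG database into memory
    if filters:
        cmd.extend(FILTER_ARGS)
    return cmd


# Hypothetical accession file; prints one command per variant for pasting into a job script.
for use_filters in (True, False):
    print(shlex.join(build_emapper_cmd("3Dt1c.faa", "3Dt1c", "results", filters=use_filters)))
```

Generating both variants from one function keeps everything else identical, so any timing difference between runs can be put down to the arguments that changed.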
Run 1 had mixed results. 3 sets of 3 ran in parallel: set 1 completed in 3 and a half hours (good), set 2 took 8 and a half hours (bad), and set 3 timed out (very bad), so I don't have the outputs needed for 1Dt100h or 1Dt1h. Run 2 was more successful, with 5 solid-looking outputs in 6 hours. Run 3 did not improve on that time despite the extra parameters being cut.

It could have been overcrowding on hawk, but it appears that running multiple sets in parallel adversely affects the result. However, I have not tested this with dbmem on. The larger problem is the volume of samples required. The desired output is 3 heatmaps comparing KO pathways:

* Comparing genera inside Sphingomonadaceae
* Comparing genera inside Microbacteriaceae
* Comparing the genera containing just our samples

| genus | n | total |
|---|---|---|
| f__Sphingomonadaceae | ||
| g__34-65-8 | 1 | |
| g__Actirhodobacter | 1 | |
| g__Alg239-R122 | 1 | |
| g__Allosphingosinicella | 20 | |
| g__Alteraurantiacibacter | 20 | |
| g__Altererythrobacter_D | 2 | |
| g__Altererythrobacter_F | 1 | |
| g__Altericroceibacterium | 3 | |
| g__Altericroceibacterium_A | 1 | |
| g__Alteripontixanthobacter | 1 | |
| g__Alteriqipengyuania | 9 | |
| g__Alteriqipengyuania_A | 2 | |
| g__Blastomonas | 8 | |
| g__CADCVW01 | 1 | |
| g__CAHJWT01 | 4 | |
| g__CFH-75059 | 1 | |
| g__Caenibius | 5 | |
| g__Chakrabartia | 9 | |
| g__Croceibacterium | 15 | |
| g__Croceicoccus | 10 | |
| g__Erythrobacter | 66 | |
| g__GCA-014117445 | 1 | |
| g__Glacieibacterium | 2 | |
| g__Hankyongella | 1 | |
| g__JACXVD01 | 1 | |
| g__Novosphingobium | 115 | |
| g__Novosphingopyxis | 2 | |
| g__Pacificimonas | 4 | |
| g__Parapontixanthobacter | 1 | |
| g__Parasphingopyxis | 7 | |
| g__Parasphingorhabdus | 18 | |
| g__Paraurantiacibacter | 1 | |
| g__Parerythrobacter | 2 | |
| g__Pelagerythrobacter | 5 | |
| g__Polymorphobacter | 9 | |
| g__Polymorphobacter_A | 1 | |
| g__Pontixanthobacter | 6 | |
| g__Pseudopontixanthobacter | 2 | |
| g__Pseudopontixanthobacter_A | 2 | |
| g__QFOP01 | 1 | |
| g__Qipengyuania | 29 | |
| g__Rhizorhabdus | 14 | |
| g__Rhizorhapis | 2 | |
| g__SCN-67-18 | 1 | |
| g__Sandaracinobacter | 4 | |
| g__Sandarakinorhabdus | 7 | |
| g__Sphingobium | 77 | |
| g__Sphingobium_A | 2 | |
| g__Sphingomicrobium | 38 | |
| g__Sphingomonas | 205 | |
| g__Sphingomonas_B | 6 | |
| g__Sphingomonas_D | 1 | |
| g__Sphingomonas_E | 3 | |
| g__Sphingomonas_G | 5 | |
| g__Sphingomonas_H | 1 | |
| g__Sphingomonas_I | 5 | |
| g__Sphingomonas_K | 1 | |
| g__Sphingomonas_L | 2 | |
| g__Sphingomonas_M | 1 | |
| g__Sphingomonas_N | 6 | |
| g__Sphingopyxis | 62 | |
| g__Sphingorhabdus_B | 25 | |
| g__Sphingorhabdus_C | 2 | |
| g__Sphingosinicella | 4 | |
| g__Tardibacter | 1 | |
| g__Thermaurantiacus | 1 | |
| g__Tsuneonella | 10 | |
| g__UBA1936 | 3 | |
| g__UBA6174 | 2 | |
| g__XMGL2 | 1 | |
| g__ZODW24 | 1 | |
| g__Zymomonas | 3 | |
| Total | — | 887 |

| genus | n | total |
|---|---|---|
| f__Microbacteriaceae | ||
| g__73-13 | 2 | |
| g__Agreia | 6 | |
| g__Agrococcus | 16 | |
| g__Agromyces | 41 | |
| g__Agromyces_B | 1 | |
| g__Alpinimonas | 1 | |
| g__Amnibacterium | 2 | |
| g__Aquiluna | 15 | |
| g__Aurantimicrobium | 3 | |
| g__CAIOLM01 | 1 | |
| g__Canibacter | 4 | |
| g__Chryseoglobus | 8 | |
| g__Clavibacter | 17 | |
| g__Cnuibacter | 1 | |
| g__Compostimonas | 1 | |
| g__Conyzicola | 3 | |
| g__Cryobacterium | 43 | |
| g__Cryobacterium_C | 1 | |
| g__Curtobacterium | 51 | |
| g__Cx-87 | 1 | |
| g__Diaminobutyricibacter | 1 | |
| g__Diaminobutyricimonas | 2 | |
| g__Frigoribacterium | 15 | |
| g__Frondihabitans | 5 | |
| g__Galbitalea | 2 | |
| g__Glaciibacter | 1 | |
| g__Glaciihabitans | 2 | |
| g__Gryllotalpicola | 3 | |
| g__Gulosibacter | 9 | |
| g__Herbiconiux | 7 | |
| g__Homoserinimonas | 4 | |
| g__Humibacter | 4 | |
| g__JAAFHU01 | 1 | |
| g__JAFIQW01 | 1 | |
| g__Klugiella | 1 | |
| g__Labedella | 4 | |
| g__Lacisediminihabitans | 5 | |
| g__Leifsonia | 19 | |
| g__Leifsonia_A | 4 | |
| g__Leifsonia_B | 1 | |
| g__Leucobacter | 43 | |
| g__Lumbricidophila | 1 | |
| g__Lysinibacter | 1 | |
| g__MWH-TA3 | 7 | |
| g__Marinisubtilis | 3 | |
| g__Marisediminicola | 4 | |
| g__Microbacterium | 254 | |
| g__Microbacterium_A | 4 | |
| g__Microcella | 3 | |
| g__Microterricola | 7 | |
| g__Mycetocola | 3 | |
| g__Mycetocola_A | 5 | |
| g__Mycetocola_B | 1 | |
| g__NC76-1 | 1 | |
| g__Naasia | 4 | |
| g__OACT-916 | 1 | |
| g__Okibacterium | 2 | |
| g__Planctomonas | 2 | |
| g__Plantibacter | 6 | |
| g__Pontimonas | 10 | |
| g__Protaetiibacter | 9 | |
| g__Pseudoclavibacter | 9 | |
| g__Pseudoclavibacter_A | 3 | |
| g__Pseudolysinimonas | 5 | |
| g__RFQD01 | 2 | |
| g__Rathayibacter | 22 | |
| g__Rhodoglobus | 15 | |
| g__Rhodoluna | 35 | |
| g__Root112D2 | 1 | |
| g__SCRE01 | 1 | |
| g__Schumannella | 4 | |
| g__Subtercola | 9 | |
| g__Terrimesophilobacter | 3 | |
| g__Tropheryma | 1 | |
| g__UBA3913 | 2 | |
| g__UBA963 | 5 | |
| g__WSTA01 | 2 | |
| g__Yonghaparkia | 5 | |
| g__ZJ450 | 2 | |
| Total | — | 806 |

| genus | n | total |
|---|---|---|
| g__ | 1 | |
| g__Brachybacterium | 32 | |
| g__Brevibacterium | 43 | |
| g__Microbacterium | 254 | |
| g__Pantoea | 52 | |
| g__Sphingomonas | 205 | |
| Total | — | 587 |
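
A rough back-of-envelope for the scale, using only figures from these notes: Run 2 managed 5 files in 6 hours, the two family tables total 887 + 806 = 1693 samples, and the "our genera" table totals 587. The sketch assumes the per-file rate stays constant and that parallel jobs do not slow each other down, which Run 1 suggests may not hold on hawk. The function name and the job counts tried are mine.

```python
def estimated_hours(n_samples, files_done=5, hours_taken=6.0, parallel_jobs=1):
    """Wall-clock hours, assuming a constant per-file rate and perfect parallel scaling."""
    hours_per_file = hours_taken / files_done
    return n_samples * hours_per_file / parallel_jobs


family_total = 887 + 806  # Sphingomonadaceae + Microbacteriaceae totals from the tables

for label, n in [("family comparison", family_total), ("our genera only", 587)]:
    for jobs in (1, 3, 10):
        hours = estimated_hours(n, parallel_jobs=jobs)
        print(f"{label}: {n} samples, {jobs} job(s): {hours:.0f} h (~{hours / 24:.1f} days)")
```

At the Run 2 rate the family comparison comes out at roughly 2,000 hours (about 85 days) as a single serial job, so even a generous number of parallel jobs leaves this at weeks rather than days.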

📌 TODO:

* Go and compare the content of runs 2 and 3.
* eggNog-mapper on hawk is slow, possibly too slow to scale to where we need it to be. Alternatives:
    * What scale are we looking at? 1693 samples for the family comparison.
    * Could limit to genera with more than 30 samples.
    * Screen-scrape method.
    * Enlist more manpower to do it online (if we have to do it on the web).
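
One of the alternatives above is limiting to genera with more than 30 samples. A minimal sketch of how that cut could be applied, assuming the genus tables are available as markdown rows in the same layout as in these notes. The excerpt is only a handful of rows copied from the Sphingomonadaceae table, and the function names are hypothetical.

```python
def parse_genus_counts(table_text):
    """Read '| g__Name | n | |' rows into {genus: n}.

    Rows whose first cell does not start with 'g__', or whose count is not
    an integer (header, separator, family row), are skipped.
    """
    counts = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[0].startswith("g__") and cells[1].isdigit():
            counts[cells[0]] = int(cells[1])
    return counts


def genera_above(counts, threshold=30):
    """Keep genera with strictly more than `threshold` samples."""
    return {genus: n for genus, n in counts.items() if n > threshold}


# Excerpt only: a few rows copied from the Sphingomonadaceae table.
EXCERPT = """
| genus | n | |
|---|---|---|
| f__Sphingomonadaceae | ||
| g__Erythrobacter | 66 | |
| g__Qipengyuania | 29 | |
| g__Sphingomicrobium | 38 | |
| g__Zymomonas | 3 | |
"""

kept = genera_above(parse_genus_counts(EXCERPT))
print(kept, sum(kept.values()))
```

Going through the full tables by hand with the same cut, I count 6 genera over 30 in each family (563 samples in Sphingomonadaceae and 467 in Microbacteriaceae), so about 1030 samples instead of 1693. That count is worth re-checking against the tables before relying on it.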